Modeling Ambiguity in a Multi-Agent System
This paper investigates the formal pragmatics of ambiguous expressions by
modeling ambiguity in a multi-agent system. Such a framework allows us to give
a more refined notion of the kind of information that is conveyed by ambiguous
expressions. We analyze how ambiguity affects the knowledge of the dialog
participants and, especially, what they know about each other after an
ambiguous sentence has been uttered. The agents communicate with each other by
means of a TELL-function, whose application is constrained by an implementation
of some of Grice's maxims. The information states of the multi-agent system
itself are represented as Kripke structures, and TELL is an update function on
those structures. This framework enables us to distinguish between the
information conveyed by ambiguous sentences and that conveyed by disjunctions,
and between semantic ambiguity and perceived ambiguity.
Comment: 7 pages
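The update semantics described above can be illustrated with a toy possible-worlds model. The names (`tell_ambiguous`, `reading_a`) and the two-reading setup are illustrative assumptions, not the paper's actual TELL-function or Kripke machinery:

```python
# Toy possible-worlds sketch of the update performed by an ambiguous
# utterance. Each world assigns a truth value to the two readings of
# one sentence.
worlds = [
    {"reading_a": True,  "reading_b": True},
    {"reading_a": True,  "reading_b": False},
    {"reading_a": False, "reading_b": True},
    {"reading_a": False, "reading_b": False},
]

def tell_ambiguous(state, readings):
    """Keep worlds where at least one reading of the utterance is true."""
    return [w for w in state if any(w[r] for r in readings)]

after = tell_ambiguous(worlds, ["reading_a", "reading_b"])
print(len(after))  # 3 worlds survive

# At this purely factual level the update coincides with a disjunction;
# the paper's distinction lives in what agents know about each other
# (higher-order knowledge), which this single-agent toy leaves out.
```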
What does Attention in Neural Machine Translation Pay Attention to?
Attention in neural machine translation provides the possibility to encode
relevant parts of the source sentence at each translation step. As a result,
attention is considered to be an alignment model as well. However, there is no
work that specifically studies attention and provides analysis of what is being
learned by attention models. Thus, the question remains how attention is
similar to, or different from, traditional alignment. In this paper, we
provide a detailed analysis of attention and compare it to traditional
alignment. We answer the question of whether attention is only capable of
modelling translational equivalence or whether it captures more information.
We show that attention differs from alignment in some cases and captures
useful information beyond alignments.
Comment: To appear in IJCNLP 2017
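One common way to make the attention-versus-alignment comparison concrete is to extract a hard alignment from the attention weights (argmax per target position) and measure agreement with a reference alignment. This is an illustrative sketch with made-up matrices, not the paper's exact analysis:

```python
# Sketch: extract a hard alignment from attention by taking, per target
# position, the source position with the highest weight, then measure
# agreement with a reference alignment.

def attention_to_alignment(attn):
    """attn[t][s] = attention weight on source position s at target step t."""
    return {t: max(range(len(row)), key=row.__getitem__)
            for t, row in enumerate(attn)}

def agreement(predicted, reference):
    """Fraction of target positions whose argmax matches the reference link."""
    hits = sum(1 for t, s in predicted.items() if reference.get(t) == s)
    return hits / len(predicted)

attn = [[0.7, 0.2, 0.1],   # target 0 attends mostly to source 0
        [0.1, 0.8, 0.1],   # target 1 -> source 1
        [0.2, 0.3, 0.5]]   # target 2 spreads its attention
gold = {0: 0, 1: 1, 2: 1}  # reference alignment (target -> source)

pred = attention_to_alignment(attn)
print(agreement(pred, gold))  # 2 of 3 links agree
```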
Computing Presuppositions by Contextual Reasoning
This paper describes how automated deduction methods for natural language
processing can be applied more efficiently by encoding context in a more
elaborate way. Our work is based on formal approaches to context, and we
provide a tableau calculus for contextual reasoning. This is explained by
considering an example from the problem area of presupposition projection.
Comment: 5 pages
Recurrent Memory Networks for Language Modeling
Recurrent Neural Networks (RNNs) have obtained excellent results in many
natural language processing (NLP) tasks. However, understanding and
interpreting the source of this success remains a challenge. In this paper, we
propose the Recurrent Memory Network (RMN), a novel RNN architecture that not
only amplifies the power of RNNs but also facilitates our understanding of
their internal functioning and allows us to discover underlying patterns in
data. We demonstrate the power of RMN on language modeling and sentence
completion tasks. On language modeling, RMN outperforms Long Short-Term Memory
(LSTM) networks on three large German, Italian, and English datasets.
Additionally, we perform an in-depth analysis of the various linguistic
dimensions that RMN captures. On the Sentence Completion Challenge, for which
it is essential to capture sentence coherence, our RMN obtains 69.2% accuracy,
surpassing the previous state of the art by a large margin.
Comment: 8 pages, 6 figures. Accepted at NAACL 2016
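The memory-block idea can be sketched as attention from the current hidden state over the vectors of the most recent words. A minimal pure-Python illustration with made-up vectors; the real RMN uses learned projections and gating:

```python
import math

# Minimal sketch of a memory block: the current hidden state attends over
# the vectors of the most recent words and returns their weighted sum.

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def memory_block(hidden, memory):
    """Attend from `hidden` over recent word vectors in `memory`."""
    scores = [sum(h * m for h, m in zip(hidden, vec)) for vec in memory]
    weights = softmax(scores)
    return [sum(w * vec[i] for w, vec in zip(weights, memory))
            for i in range(len(hidden))]

hidden = [1.0, 0.0]                            # current RNN state
memory = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # last three word vectors
out = memory_block(hidden, memory)
print(out)  # pulled toward the first vector, which matches `hidden`
```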
Data Augmentation for Low-Resource Neural Machine Translation
The quality of a Neural Machine Translation system depends substantially on
the availability of sizable parallel corpora. For low-resource language pairs
this is not the case, resulting in poor translation quality. Inspired by work
in computer vision, we propose a novel data augmentation approach that targets
low-frequency words by generating new sentence pairs containing rare words in
new, synthetically created contexts. Experimental results on simulated
low-resource settings show that our method improves translation quality by up
to 2.9 BLEU points over the baseline and up to 3.2 BLEU over back-translation.
Comment: 5 pages, 2 figures. Accepted at ACL 2017
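The augmentation idea can be sketched as substituting a rare word into an existing sentence pair at aligned positions. The real method uses a language model to pick plausible positions and substitutions; this toy version, with an assumed one-to-one alignment, omits that step:

```python
# Sketch of targeted augmentation: drop a rare word into an existing
# sentence context to create a new synthetic pair. Assumes the source and
# target words at `position` are aligned one-to-one.

def augment(pair, position, rare_src, rare_tgt):
    """Replace the aligned words at `position` in a (source, target) pair."""
    src, tgt = (s.split() for s in pair)
    src[position], tgt[position] = rare_src, rare_tgt
    return " ".join(src), " ".join(tgt)

pair = ("das Haus ist klein", "the house is small")
new_pair = augment(pair, 3, "winzig", "tiny")
print(new_pair)  # ('das Haus ist winzig', 'the house is tiny')
```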
Learning Topic-Sensitive Word Representations
Distributed word representations are widely used for modeling words in NLP
tasks. Most of the existing models generate one representation per word and do
not consider different meanings of a word. We present two approaches to learn
multiple topic-sensitive representations per word using the Hierarchical
Dirichlet Process. We observe that by modeling topics and integrating topic
distributions for each document we obtain representations that are able to
distinguish between different meanings of a given word. Our models yield
statistically significant improvements for the lexical substitution task
indicating that commonly used single word representations, even when combined
with contextual information, are insufficient for this task.
Comment: 5 pages, 1 figure. Accepted at ACL 2017
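The core idea can be illustrated with one embedding per (word, topic) and a document-level topic distribution that mixes them. The vectors and topic weights below are made up, and the HDP inference is omitted:

```python
# Toy illustration: one embedding per (word, topic); a word's in-context
# representation mixes these by the document's topic distribution.

topic_vectors = {
    ("bank", "finance"):   [1.0, 0.0],
    ("bank", "geography"): [0.0, 1.0],
}

def topic_sensitive(word, topic_dist):
    """Mix the word's per-topic vectors by the document topic weights."""
    vec = [0.0, 0.0]
    for topic, weight in topic_dist.items():
        tv = topic_vectors[(word, topic)]
        vec = [v + weight * t for v, t in zip(vec, tv)]
    return vec

finance_doc = {"finance": 0.9, "geography": 0.1}
rep = topic_sensitive("bank", finance_doc)
print(rep)  # leans toward the finance sense of "bank"
```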
Examining the Tip of the Iceberg: A Data Set for Idiom Translation
Neural Machine Translation (NMT) has been widely used in recent years with
significant improvements for many language pairs. Although state-of-the-art NMT
systems are generating progressively better translations, idiom translation
remains one of the open challenges in this field. Idioms, a category of
multiword expressions, are an interesting language phenomenon where the overall
meaning of the expression cannot be composed from the meanings of its parts. A
first important challenge is the lack of dedicated data sets for learning and
evaluating idiom translation. In this paper we address this problem by creating
the first large-scale data set for idiom translation. Our data set is
automatically extracted from a widely used German-English translation corpus
and includes, for each language direction, a targeted evaluation set where all
sentences contain idioms and a regular training corpus where sentences
including idioms are marked. We release this data set and use it to perform
preliminary NMT experiments as the first step towards better idiom translation.
Comment: Accepted at LREC 2018
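The extraction step can be sketched as routing parallel sentence pairs whose source side contains a known idiom into a targeted evaluation set. The idiom list and two-pair corpus below are toy examples, not the released data:

```python
# Sketch of the data-set construction: separate pairs containing a known
# idiom (targeted evaluation set) from the rest (regular training data).

idioms = ["kick the bucket", "unter vier Augen"]

def split_corpus(pairs):
    """Route idiom-containing pairs into the evaluation set."""
    eval_set, train_set = [], []
    for src, tgt in pairs:
        if any(idiom in src for idiom in idioms):
            eval_set.append((src, tgt))
        else:
            train_set.append((src, tgt))
    return eval_set, train_set

corpus = [
    ("he will kick the bucket soon", "er wird bald sterben"),
    ("the house is small", "das Haus ist klein"),
]
eval_set, train_set = split_corpus(corpus)
print(len(eval_set), len(train_set))  # 1 1
```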
Towards a Better Understanding of Variations in Zero-Shot Neural Machine Translation Performance
Multilingual Neural Machine Translation (MNMT) facilitates knowledge sharing
but often suffers from poor zero-shot (ZS) translation quality. While prior
work has explored the causes of overall low ZS performance, our work introduces
a fresh perspective: the presence of high variations in ZS performance. This
suggests that MNMT does not uniformly exhibit poor ZS capability; instead,
certain translation directions yield reasonable results. Through systematic
experimentation involving 1,560 language directions spanning 40 languages, we
identify three key factors contributing to high variations in ZS NMT
performance: 1) target-side translation capability, 2) vocabulary overlap, and
3) linguistic properties. Our findings highlight that target-side translation
quality is the most influential factor, with vocabulary overlap consistently
impacting ZS performance. Additionally, linguistic properties, such as language
family and writing system, play a role, particularly with smaller models.
Furthermore, we suggest that the off-target issue is a symptom of inadequate ZS
performance, emphasizing that zero-shot translation challenges extend beyond
addressing the off-target problem. We release the data and models as a
benchmark for future research on zero-shot NMT at
https://github.com/Smu-Tan/ZS-NMT-Variations
Comment: Accepted at the EMNLP 2023 Main Conference
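Vocabulary overlap, one of the three factors above, can be measured in several ways. A simple Jaccard-style sketch over toy sentences; the paper's exact overlap definition may differ:

```python
# Jaccard-style vocabulary overlap between two languages' training data.
# The toy sentences below are assumptions for illustration.

def vocab_overlap(sents_a, sents_b):
    """Jaccard similarity of the two token vocabularies."""
    vocab_a = {w for s in sents_a for w in s.split()}
    vocab_b = {w for s in sents_b for w in s.split()}
    return len(vocab_a & vocab_b) / len(vocab_a | vocab_b)

german = ["das haus ist klein", "die katze schläft"]
dutch = ["het huis is klein", "de kat slaapt"]
print(round(vocab_overlap(german, dutch), 3))  # only "klein" is shared
```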
Optimizing Transformer for Low-Resource Neural Machine Translation
Language pairs with limited amounts of parallel data, also known as
low-resource languages, remain a challenge for neural machine translation.
While the Transformer model has achieved significant improvements for many
language pairs and has become the de facto mainstream architecture, its
capability under low-resource conditions has not been fully investigated yet.
Our experiments on different subsets of the IWSLT14 training data show that the
effectiveness of Transformer under low-resource conditions is highly dependent
on the hyper-parameter settings. Our experiments show that using an optimized
Transformer for low-resource conditions improves the translation quality up to
7.3 BLEU points compared to using the default Transformer settings.
Comment: To be published in COLING 2020
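The kind of tuning involved can be illustrated by contrasting a default configuration with a hypothetical low-resource one. The values below are illustrative examples only, not the settings reported in the paper:

```python
# Illustrative contrast between default Transformer hyper-parameters and a
# hypothetical low-resource configuration.

default = {"layers": 6, "heads": 8, "dropout": 0.1, "bpe_merges": 32000}
low_resource = {"layers": 6, "heads": 2, "dropout": 0.3, "bpe_merges": 10000}

# Which knobs were changed for the low-resource setting?
changed = {k for k in default if default[k] != low_resource[k]}
print(sorted(changed))  # ['bpe_merges', 'dropout', 'heads']
```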